Working with geographical data using DataFrames.jl
Minisymposium on tabular data. The creation of this tutorial has been supported by the Polish National Agency for Academic Exchange under the Strategic Partnerships programme, grant number BPI/PST/2021/1/00069/U/00001.
In [20]:
using OSMToolset # Pkg.add(url="https://github.com/pszufe/OSMToolset.jl", rev="main")
using DataFrames
using PyCall
using OpenStreetMapX
using Colors
using Plots
using SimpleValueGraphs
using OpenStreetMapXPlot
flm = pyimport("folium");
The boston.osm file was downloaded from https://overpass-api.de/api/map?bbox=-71.1098,42.3521,-71.0502,42.3805
Alternatively, you can use the smaller map.osm file that ships with the OSMToolset test data (uncomment the first line of the next cell).
In [21]:
#file = joinpath(dirname(pathof(OSMToolset)),"..","test","data","map.osm")
file = joinpath(dirname(pathof(OSMToolset)),"..","boston.osm")
tt = @elapsed df = find_poi(file)
map_data = get_map_data(file, use_cache=false, trim_to_connected_graph=true );
println("OSM file parsing speed: $(round(filesize(file)/(1024*1024)/tt,digits=1)) MB/s")
OSM file parsing speed: 11.4 MB/s
The attractiveness spatial index (ix) is built from the DataFrame of points of interest returned by find_poi. The attractiveness of a location is then evaluated by passing its coordinates (latitude and longitude) to attractiveness.
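A rough mental model of such an index: each point of interest has a class, a number of points and a range in meters (the class, points and range columns of ix.df), and it adds to a location's score for its class only while the location is within range. The sketch below assumes a linear decay from the full points at distance 0 down to 0 at the edge of the range, a spherical haversine distance, and a plain sum per class. These are illustrative assumptions, not necessarily the formula OSMToolset uses; Poi, haversine_m, contribution and score are hypothetical names. The two points of interest are taken from the explain=true output of this notebook, with ranges borrowed from similar rows of ix.df.

```julia
# Hypothetical record for a point of interest (not an OSMToolset type).
struct Poi
    class::Symbol
    points::Int
    range::Float64   # meters
    lat::Float64
    lon::Float64
end

# Great-circle distance in meters (haversine formula, spherical Earth).
function haversine_m(lat1, lon1, lat2, lon2; R = 6_371_000.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2R * asin(sqrt(a))
end

# ASSUMED scoring rule: linear decay from `points` at distance 0 to 0 at `range`.
contribution(p::Poi, d) = d >= p.range ? 0.0 : p.points * (p.range - d) / p.range

# Sum the contributions per class for the location (lat, lon).
function score(pois::Vector{Poi}, lat, lon)
    totals = Dict{Symbol,Float64}()
    for p in pois
        d = haversine_m(lat, lon, p.lat, p.lon)
        totals[p.class] = get(totals, p.class, 0.0) + contribution(p, d)
    end
    return totals
end

pois = [Poi(:education, 20, 10_000.0, 42.3518, -71.0864),
        Poi(:leisure,    5,    500.0, 42.3531, -71.0849)]
totals = score(pois, 42.361732327506516, -71.09043164955334)
```

Under this assumed rule the education item contributes roughly 17.7 of its 20 points, while the leisure item, about 1060 m away, lies outside its 500 m range and contributes nothing. The haversine distances (roughly 1150 m and 1060 m) agree with the poidistance values 1152.68 and 1063.19 in the explain table up to the rounding of the displayed coordinates.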
In [34]:
ix = AttractivenessSpatIndex(df);
# Location: MIT Stata Center, Cambridge, MA
lat=42.361732327506516
lon=-71.09043164955334
attract = attractiveness(ix, lat, lon)
Out[34]:
(education = 364.07988461350504, parking = 1.0575223826966949, shopping = 16.62025527806438, transport = 7.1895280000006405, healthcare = 3.477632654568438, entertainment = 47.89522092353563, leisure = 290.5967600088863, restaurants = 113.55682405754388)
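Because attractiveness returns a NamedTuple with one score per class, ordinary Julia collection functions work on it directly. The sketch below copies the scores printed above (rounded by hand) into attract_demo, a stand-in for attract, then picks the strongest class and ranks all classes.

```julia
# Scores copied from the output above, rounded to two decimals.
attract_demo = (education = 364.08, parking = 1.06, shopping = 16.62,
                transport = 7.19, healthcare = 3.48, entertainment = 47.9,
                leisure = 290.6, restaurants = 113.56)

# Class with the highest score at this location.
best = argmax(k -> attract_demo[k], keys(attract_demo))

# All classes ranked from most to least attractive.
ranking = sort(collect(keys(attract_demo)); by = k -> attract_demo[k], rev = true)
```

For the Stata Center location this puts education first, followed by leisure and restaurants, with parking last.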
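Passing explain=true additionally returns a DataFrame (the second element of the result) listing the points of interest considered for each class, with their points, distance (poidistance, in meters) and location. This shows how each value was built.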
In [23]:
attract = attractiveness(ix, lat, lon, explain=true)
attract[2]
Out[23]:
283×5 DataFrame
258 rows omitted
| Row | class | points | poidistance | lat | lon |
|---|---|---|---|---|---|
| | Symbol | Int64 | Float64 | Float64 | Float64 |
| 1 | education | 20 | 1152.68 | 42.3518 | -71.0864 |
| 2 | education | 5 | 1268.2 | 42.3522 | -71.082 |
| 3 | education | 20 | 2297.67 | 42.3522 | -71.0657 |
| 4 | leisure | 5 | 1063.19 | 42.3531 | -71.0849 |
| 5 | leisure | 5 | 1067.76 | 42.3531 | -71.0846 |
| 6 | education | 20 | 1725.15 | 42.3533 | -71.108 |
| 7 | education | 5 | 1427.18 | 42.3535 | -71.0771 |
| 8 | education | 20 | 2475.85 | 42.3538 | -71.0623 |
| 9 | leisure | 5 | 1450.25 | 42.3543 | -71.1049 |
| 10 | education | 20 | 2936.6 | 42.3545 | -71.0562 |
| 11 | education | 5 | 1394.8 | 42.3546 | -71.0765 |
| 12 | leisure | 5 | 1195.02 | 42.3547 | -71.0795 |
| 13 | leisure | 5 | 1272.5 | 42.3548 | -71.1027 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 272 | leisure | 5 | 1302.93 | 42.3726 | -71.0845 |
| 273 | leisure | 5 | 1383.67 | 42.3733 | -71.0841 |
| 274 | leisure | 5 | 1474.09 | 42.3733 | -71.0816 |
| 275 | leisure | 5 | 1406.36 | 42.3735 | -71.084 |
| 276 | education | 5 | 2326.52 | 42.3751 | -71.1122 |
| 277 | education | 3 | 1762.13 | 42.3774 | -71.0937 |
| 278 | education | 5 | 2728.09 | 42.3778 | -71.0654 |
| 279 | education | 5 | 2798.49 | 42.3783 | -71.0648 |
| 280 | education | 5 | 1968.07 | 42.3784 | -71.0985 |
| 281 | education | 20 | 2419.71 | 42.3785 | -71.0716 |
| 282 | education | 5 | 2653.18 | 42.3788 | -71.0679 |
| 283 | education | 5 | 1974.34 | 42.3789 | -71.0967 |
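The explain table is an ordinary DataFrame, so it can be summarized with the usual split-apply-combine tools of DataFrames.jl, for example to count the points of interest per class, add up their nominal points (before any distance weighting) and find the nearest one. The sketch below types in the first five rows shown above by hand, so its numbers describe only that excerpt; expl and per_class are illustrative names.

```julia
using DataFrames

# First five rows of the explain table above, typed in by hand.
expl = DataFrame(class = [:education, :education, :education, :leisure, :leisure],
                 points = [20, 5, 20, 5, 5],
                 poidistance = [1152.68, 1268.2, 2297.67, 1063.19, 1067.76])

# Per class: number of POIs, total nominal points, nearest POI distance (m).
per_class = combine(groupby(expl, :class),
                    nrow => :n_poi,
                    :points => sum => :total_points,
                    :poidistance => minimum => :nearest_m)
```

Applied to the full attract[2] DataFrame, the same call gives an overview of all 283 rows.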
In [24]:
ix.df
Out[24]:
2576×10 DataFrame
2551 rows omitted
| Row | elemtype | elemid | nodeid | lat | lon | key | value | class | points | range |
|---|---|---|---|---|---|---|---|---|---|---|
| | Symbol | Int64 | Int64 | Float64 | Float64 | String | String | String | Int64 | Int64 |
| 1 | node | 69480814 | 69480814 | 42.357 | -71.0588 | public_transport | stop_position | transport | 5 | 300 |
| 2 | node | 69482188 | 69482188 | 42.3599 | -71.06 | public_transport | stop_position | transport | 5 | 300 |
| 3 | node | 69482993 | 69482993 | 42.3525 | -71.0549 | public_transport | stop_position | transport | 5 | 300 |
| 4 | node | 69487423 | 69487423 | 42.3736 | -71.0697 | railway | station | transport | 10 | 700 |
| 5 | node | 69487440 | 69487440 | 42.3654 | -71.1037 | public_transport | stop_position | transport | 5 | 300 |
| 6 | node | 69488191 | 69488191 | 42.3612 | -71.0713 | public_transport | stop_position | transport | 5 | 300 |
| 7 | node | 69488839 | 69488839 | 42.3568 | -71.0633 | public_transport | stop_position | transport | 5 | 300 |
| 8 | node | 69490284 | 69490284 | 42.3652 | -71.0603 | public_transport | stop_position | transport | 5 | 300 |
| 9 | node | 69490954 | 69490954 | 42.3518 | -71.0627 | public_transport | stop_position | transport | 5 | 300 |
| 10 | node | 69493810 | 69493810 | 42.3534 | -71.0644 | public_transport | stop_position | transport | 5 | 300 |
| 11 | node | 69504608 | 69504608 | 42.3588 | -71.0578 | railway | station | transport | 10 | 700 |
| 12 | node | 69504958 | 69504958 | 42.3667 | -71.0678 | public_transport | stop_position | transport | 5 | 300 |
| 13 | node | 69505365 | 69505365 | 42.3522 | -71.0626 | railway | station | transport | 10 | 700 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 2565 | relation | 10076749 | 300424152 | 42.358 | -71.0987 | leisure | track | leisure | 5 | 800 |
| 2566 | relation | 11201448 | 327171087 | 42.3575 | -71.0611 | amenity | university | education | 20 | 10000 |
| 2567 | relation | 11201473 | 1325838658 | 42.3538 | -71.0623 | amenity | college | education | 20 | 10000 |
| 2568 | relation | 11601161 | 328458380 | 42.3691 | -71.0722 | amenity | college | education | 20 | 10000 |
| 2569 | relation | 12661390 | 2623465830 | 42.3666 | -71.0831 | amenity | parking | parking | 5 | 250 |
| 2570 | relation | 13405301 | 4181829490 | 42.3665 | -71.0951 | leisure | park | leisure | 5 | 500 |
| 2571 | relation | 14205090 | 325403200 | 42.3802 | -71.0965 | amenity | bank | shopping | 1 | 750 |
| 2572 | relation | 14205091 | 327067372 | 42.38 | -71.0963 | shop | supermarket | shopping | 5 | 500 |
| 2573 | relation | 14205096 | 327524055 | 42.3798 | -71.0943 | amenity | restaurant | restaurants | 5 | 750 |
| 2574 | relation | 14205406 | 9784109000 | 42.38 | -71.0934 | amenity | parking | parking | 5 | 250 |
| 2575 | relation | 14205408 | 327175969 | 42.38 | -71.0956 | amenity | parking | parking | 5 | 250 |
| 2576 | relation | 15704864 | 10800012568 | 42.3551 | -71.1022 | leisure | garden | leisure | 5 | 500 |
In [35]:
df = deepcopy(ix.df)
vals = ENU.(LLA.(df.lat, df.lon), Ref(map_data.bounds)) # ENU (meters) relative to the map_data bounds; Ref(ix.refLLA) would use the index's reference point instead
df.x .= getfield.(vals, :east)
df.y .= getfield.(vals, :north)
df
Out[35]:
2576×12 DataFrame
2551 rows omitted
| Row | elemtype | elemid | nodeid | lat | lon | key | value | class | points | range | x | y |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Symbol | Int64 | Int64 | Float64 | Float64 | String | String | String | Int64 | Int64 | Float64 | Float64 |
| 1 | node | 69480814 | 69480814 | 42.357 | -71.0588 | public_transport | stop_position | transport | 5 | 300 | 1746.64 | -1035.44 |
| 2 | node | 69482188 | 69482188 | 42.3599 | -71.06 | public_transport | stop_position | transport | 5 | 300 | 1647.18 | -710.309 |
| 3 | node | 69482993 | 69482993 | 42.3525 | -71.0549 | public_transport | stop_position | transport | 5 | 300 | 2066.0 | -1529.94 |
| 4 | node | 69487423 | 69487423 | 42.3736 | -71.0697 | railway | station | transport | 10 | 700 | 846.854 | 813.538 |
| 5 | node | 69487440 | 69487440 | 42.3654 | -71.1037 | public_transport | stop_position | transport | 5 | 300 | -1955.96 | -100.232 |
| 6 | node | 69488191 | 69488191 | 42.3612 | -71.0713 | public_transport | stop_position | transport | 5 | 300 | 713.655 | -567.151 |
| 7 | node | 69488839 | 69488839 | 42.3568 | -71.0633 | public_transport | stop_position | transport | 5 | 300 | 1378.13 | -1056.26 |
| 8 | node | 69490284 | 69490284 | 42.3652 | -71.0603 | public_transport | stop_position | transport | 5 | 300 | 1619.06 | -125.945 |
| 9 | node | 69490954 | 69490954 | 42.3518 | -71.0627 | public_transport | stop_position | transport | 5 | 300 | 1423.73 | -1610.27 |
| 10 | node | 69493810 | 69493810 | 42.3534 | -71.0644 | public_transport | stop_position | transport | 5 | 300 | 1282.82 | -1430.66 |
| 11 | node | 69504608 | 69504608 | 42.3588 | -71.0578 | railway | station | transport | 10 | 700 | 1831.26 | -832.385 |
| 12 | node | 69504958 | 69504958 | 42.3667 | -71.0678 | public_transport | stop_position | transport | 5 | 300 | 1007.56 | 45.0045 |
| 13 | node | 69505365 | 69505365 | 42.3522 | -71.0626 | railway | station | transport | 10 | 700 | 1433.02 | -1564.24 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 2565 | relation | 10076749 | 300424152 | 42.358 | -71.0987 | leisure | track | leisure | 5 | 800 | -1540.54 | -922.13 |
| 2566 | relation | 11201448 | 327171087 | 42.3575 | -71.0611 | amenity | university | education | 20 | 10000 | 1559.93 | -977.266 |
| 2567 | relation | 11201473 | 1325838658 | 42.3538 | -71.0623 | amenity | college | education | 20 | 10000 | 1454.71 | -1387.61 |
| 2568 | relation | 11601161 | 328458380 | 42.3691 | -71.0722 | amenity | college | education | 20 | 10000 | 640.989 | 308.822 |
| 2569 | relation | 12661390 | 2623465830 | 42.3666 | -71.0831 | amenity | parking | parking | 5 | 250 | -254.197 | 29.7408 |
| 2570 | relation | 13405301 | 4181829490 | 42.3665 | -71.0951 | leisure | park | leisure | 5 | 500 | -1247.5 | 26.5816 |
| 2571 | relation | 14205090 | 325403200 | 42.3802 | -71.0965 | amenity | bank | shopping | 1 | 750 | -1356.82 | 1540.61 |
| 2572 | relation | 14205091 | 327067372 | 42.38 | -71.0963 | shop | supermarket | shopping | 5 | 500 | -1341.31 | 1519.23 |
| 2573 | relation | 14205096 | 327524055 | 42.3798 | -71.0943 | amenity | restaurant | restaurants | 5 | 750 | -1175.96 | 1503.01 |
| 2574 | relation | 14205406 | 9784109000 | 42.38 | -71.0934 | amenity | parking | parking | 5 | 250 | -1100.23 | 1522.57 |
| 2575 | relation | 14205408 | 327175969 | 42.38 | -71.0956 | amenity | parking | parking | 5 | 250 | -1287.99 | 1526.96 |
| 2576 | relation | 15704864 | 10800012568 | 42.3551 | -71.1022 | leisure | garden | leisure | 5 | 500 | -1830.14 | -1244.56 |
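The x and y columns are east and north offsets in meters from a reference point. They can be sanity-checked with a simple equirectangular (local flat-Earth) approximation on a spherical Earth. This is a swapped-in approximation, not how OpenStreetMapX computes ENU, and taking the center of the download bbox from the URL above as the reference point is an assumption that happens to fit row 1 closely. approx_enu, ref_lat, ref_lon and enu_row1 are illustrative names.

```julia
# Equirectangular approximation of East/North offsets (meters) from a reference
# point on a sphere of radius R; only adequate over a few kilometers.
function approx_enu(lat, lon, ref_lat, ref_lon; R = 6_371_000.0)
    east  = deg2rad(lon - ref_lon) * R * cosd(ref_lat)
    north = deg2rad(lat - ref_lat) * R
    return (east = east, north = north)
end

# ASSUMED reference point: center of the Overpass bbox used to download boston.osm.
ref_lat = (42.3521 + 42.3805) / 2
ref_lon = (-71.1098 + -71.0502) / 2

# Row 1 of the table above (a public transport stop position).
enu_row1 = approx_enu(42.357, -71.0588, ref_lat, ref_lon)
```

The result lands within about ten meters of the tabulated x = 1746.64 and y = -1035.44; the residual comes from rounding of the displayed coordinates and from the spherical Earth model.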
In [36]:
colrs = distinguishable_colors(length(ix.measures), [RGB(0.1,0.2,0.4)])
class2col = Dict(ix.measures .=> colrs)
colrs
Out[36]:
In [37]:
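# Stamen tiles are no longer served without a Stadia Maps API key; CartoDB Positron is a similar light basemap
m = flm.Map(tiles = "cartodbpositron")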
function latlon(md::MapData, map_g_point_id::Int64)
    # map a graph vertex index to its OSM node and return (lat, lon)
    osm_node_ix = md.n[map_g_point_id]
    lla = LLA(md.nodes[osm_node_ix], md.bounds)
    return (lla.lat, lla.lon)
end
# draw the road network (edges of the routing graph)
for e in SimpleValueGraphs.edges(map_data.g)
    flm.PolyLine((latlon(map_data, e.src), latlon(map_data, e.dst)),
        color="#a4f3a7", weight=2, opacity=0.8).add_to(m)
end
# draw one circle per point of interest, colored by its class
for row in eachrow(df)
    info = "$(row.class):$(row.key)=$(row.value)"
    col = "#$(hex(class2col[Symbol(row.class)]))"
    flm.Circle((row.lat, row.lon), color=col, radius=row.points,
        fill_color=col, fill_opacity=0.06, tooltip=info).add_to(m)
end
# outline the area covered by map_data and zoom to it
MAP_BOUNDS = [(map_data.bounds.min_y, map_data.bounds.min_x), (map_data.bounds.max_y, map_data.bounds.max_x)]
flm.Rectangle(MAP_BOUNDS, color="blue", weight=2).add_to(m)
m.fit_bounds(MAP_BOUNDS)
m
Out[37]:
In [28]:
ix.measures
Out[28]:
8-element Vector{Symbol}:
:education
:entertainment
:healthcare
:leisure
:parking
:restaurants
:shopping
:transport
In [29]:
p = plotmap(map_data; width=900, height=600)
Out[29]:
In [30]:
using StatsBase
cellsize = 100 # cell side length in meters
xmin = minimum(df.x)
ymin = minimum(df.y)
xwidth = maximum(df.x) - xmin
ywidth = maximum(df.y) - ymin
# nrows/ncols rather than nrow/ncol, which would shadow the DataFrames functions
nrows = round(Int, ywidth / cellsize)
ncols = round(Int, xwidth / cellsize)
attdf = DataFrame()
for i in 1:nrows
    for j in 1:ncols
        # center of cell (i, j)
        x = xmin + cellsize*(j - 0.5)
        y = ymin + cellsize*(i - 0.5)
        enu = ENU(x, y)                  # ENU in map_data coordinates
        lla = LLA(enu, map_data.bounds)
        enu2 = ENU(lla, ix.refLLA)       # ENU in spatial index coordinates
        att = attractiveness(ix, enu2, +)
        # one row per class and cell
        for measu in ix.measures
            a = att[measu]
            push!(attdf, (; measu, x, y, a))
        end
    end
end
attdf
Out[30]:
172360×4 DataFrame
172335 rows omitted
| Row | measu | x | y | a |
|---|---|---|---|---|
| | Symbol | Float64 | Float64 | Float64 |
| 1 | education | -4545.5 | -3980.21 | 138.46 |
| 2 | entertainment | -4545.5 | -3980.21 | 0.0 |
| 3 | healthcare | -4545.5 | -3980.21 | 0.0 |
| 4 | leisure | -4545.5 | -3980.21 | 0.0 |
| 5 | parking | -4545.5 | -3980.21 | 0.0 |
| 6 | restaurants | -4545.5 | -3980.21 | 0.0 |
| 7 | shopping | -4545.5 | -3980.21 | 0.0 |
| 8 | transport | -4545.5 | -3980.21 | 0.0 |
| 9 | education | -4445.5 | -3980.21 | 141.379 |
| 10 | entertainment | -4445.5 | -3980.21 | 0.0 |
| 11 | healthcare | -4445.5 | -3980.21 | 0.0 |
| 12 | leisure | -4445.5 | -3980.21 | 0.0 |
| 13 | parking | -4445.5 | -3980.21 | 0.0 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 172349 | parking | 10754.5 | 9819.79 | 0.0 |
| 172350 | restaurants | 10754.5 | 9819.79 | 0.0 |
| 172351 | shopping | 10754.5 | 9819.79 | 0.0 |
| 172352 | transport | 10754.5 | 9819.79 | 3.04296 |
| 172353 | education | 10854.5 | 9819.79 | 0.0 |
| 172354 | entertainment | 10854.5 | 9819.79 | 0.0 |
| 172355 | healthcare | 10854.5 | 9819.79 | 0.0 |
| 172356 | leisure | 10854.5 | 9819.79 | 0.0 |
| 172357 | parking | 10854.5 | 9819.79 | 0.0 |
| 172358 | restaurants | 10854.5 | 9819.79 | 0.0 |
| 172359 | shopping | 10854.5 | 9819.79 | 0.0 |
| 172360 | transport | 10854.5 | 9819.79 | 4.47852 |
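The loop above queries the index at the center of every 100 m grid cell and stores one row per class and cell, which is why attdf has 172360 = 21545 × 8 rows (155 columns × 139 rows of cells, as the first and last x and y values in the table imply). The Base-only sketch below generates the same cell centers up front, in the same row-major order, which keeps the grid geometry separate from the index queries. cell_centers is a hypothetical helper, not part of OSMToolset.

```julia
# Centers of a regular grid starting at (xmin, ymin), with
# round(Int, width / cellsize) cells per axis as in the loop above.
function cell_centers(xmin, ymin, xwidth, ywidth, cellsize)
    nrows = round(Int, ywidth / cellsize)
    ncols = round(Int, xwidth / cellsize)
    xs = [xmin + cellsize * (j - 0.5) for j in 1:ncols]
    ys = [ymin + cellsize * (i - 0.5) for i in 1:nrows]
    # y in the outer loop, x in the inner loop: same order as the notebook
    return [(x = x, y = y) for y in ys for x in xs]
end

centers = cell_centers(0.0, 0.0, 300.0, 200.0, 100.0)  # 3 columns × 2 rows
```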
In [31]:
# blend a color toward white: perc = 1 keeps col, perc = 0 gives white
scale(col::RGB{Float64}, perc) = RGB(col.r + (1 - col.r)*(1 - perc), col.g + (1 - col.g)*(1 - perc), col.b + (1 - col.b)*(1 - perc))
Out[31]:
scale (generic function with 1 method)
In [32]:
p2 = deepcopy(p)
poiclass = :restaurants
points = attdf[attdf.measu .== poiclass, :]
# normalize the scores to [0, 1] so they drive the color intensity
points.z .= points.a ./ maximum(points.a)
scatter!(p2, points.x, points.y; fillalpha=0.5, markershape=:rect, markeralpha=0.42,
    markerstrokewidth=0, markercolor=scale.(Ref(class2col[poiclass]), points.z))
p2
Out[32]:
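The normalization points.a ./ maximum(points.a) assumes that at least one grid cell has a positive score for the chosen class. For a class with no points of interest in reach of any cell, the maximum would be 0 and every z would become NaN (0/0). A defensive variant, sketched with a hypothetical normalize_scores helper, returns zeros in that case.

```julia
# Scale non-negative scores to [0, 1] by their maximum.
# If every score is zero, return zeros instead of NaN (0/0).
function normalize_scores(a::AbstractVector{<:Real})
    amax = maximum(a)
    return amax > 0 ? a ./ amax : zeros(Float64, length(a))
end

normalize_scores([0.0, 2.0, 4.0])  # scores relative to the best cell
normalize_scores([0.0, 0.0])       # no reachable POIs: all zeros
```

In the cell above it would be used as points.z .= normalize_scores(points.a).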